Third-order motifs are sufficient to fully and uniquely characterize spatiotemporal neural network activity

Neuroscientific analyses balance between capturing the brain’s complexity and expressing that complexity in meaningful and understandable ways. Here we present a novel approach that fully characterizes neural network activity and does so by uniquely transforming raw signals into easily interpretable and biologically relevant metrics of network behavior. We first prove that third-order (triple) correlation describes network activity in its entirety using the triple correlation uniqueness theorem. Triple correlation quantifies the relationships among three events separated by spatial and temporal lags, which are triplet motifs. Classifying these motifs by their event sequencing leads to fourteen qualitatively distinct motif classes that embody well-studied network behaviors including synchrony, feedback, feedforward, convergence, and divergence. Within these motif classes, the summed triple correlations provide novel metrics of network behavior, as well as being inclusive of commonly used analyses. We demonstrate the power of this approach on a range of networks with increasingly obscured signals, from ideal noiseless simulations to noisy experimental data. This approach can be easily applied to any recording modality, so existing neural datasets are ripe for reanalysis. Triple correlation is an accessible signal processing tool with a solid theoretical foundation capable of revealing previously elusive information within recordings of neural networks.


Uniqueness of Triple-Correlation for Network Spiking Activity
We represent neuronal activity as a typical two-dimensional raster, x(n, t), where n is neuron location and t is time. We assume that the raster is a binary (black and white, cf. Fig. 1A), meaning x(n, t) = 1 if neuron n fires at time t, and x(n, t) = 0 otherwise. We note that the reasoning in the proofs below works just as well for any raster taking bounded values (13) analogous to a greyscale image, such as would be the case with local field potential recordings or the electroencephalogram. Indeed the reasoning below applies to any finite bounded dataset.
This relationship between triple correlation and the bispectrum is the third-order equivalent of the Wiener-Khinchin-Einstein theorem.
Since images have a finite support and all of the above integration limits are implicitly at (−∞, ∞), we can modify the results for finite support by multiplying the spatiotemporal domain data by a two-dimensional boxcar window, w, limited between (−Σ, Σ, −Ω, Ω) (or any other window with that support): x b = x w. The frequency domain results are then characterized by the Fourier transforms convolved (denoted by ⊛) with the boxcar's Fourier transform (W) (or that of the window applied) denoted by: X B = X ⊛W. By using this notation, the results in equations (1)-(5) would be adapted by adding the b and B subscripts. Yellott and Iverson (13) show in a constructive proof that a finite image (in our case a spike raster of a finite size) can be uniquely reconstructed from its third-order correlation. These authors also show and discuss how this does not hold for images of infinite size. Here we do not further discuss this aspect because the size of a spike raster (or a snapshot of any modality of neural activity) is always finite.
One critically important message of this paper is that the time domain's triple correlation and the corresponding bispectrum in the frequency domain uniquely determine the firing pattern of a network. Yellott (14) presents this as the TCU theorem. If we apply Yellott's TCU theorem to a spike raster, we get the following. Theorem 1. If x(n, t) is a raster with bounded support and another raster y(n, t) has the same triple correlation function as that of x, then y(n, t) = x(n + a, t + b) for a pair of constants a, b.
Theorem 2. If x(s) is a raster with bounded support in N spatial dimensions and another raster y(s) has the same triple correlation function as that of x, then y(s) = x(s + a) for a vector of constants a.
Here we present proofs in line with the proof in Yellott and Iverson's (13) for two-and N-dimensional spatiotemporal data.

Proof of Theorem 1 (two dimensions)
Given the equality of third-order correlation functions, we can use Equation (5) to find that To borrow some convenient results from probability theory (see any introductory text, e.g. (16)), we note that X and Y can be considered characteristic functions since we can consider x and y probability distributions: as finite images, x and y are bounded and nonnegative, and without loss of generality we can normalize them such that their integral is 1. Characteristic functions (the Fourier transforms of probability distributions) have two properties that are convenient for our purposes, the first of which is that they are non-zero in a region around the origin, thus allowing us the following division (for further considerations of rigor in the one dimensional case, cf. (46)): Next we rewrite both complex functions in terms of their amplitude and phase, i.e. X(σ, ω) = |X(σ, ω)|e jϕ(X(σ,ω)) , where ϕ gives the phase. So then we can rewrite the right-hand side of (7) Here we use the second convenient property of characteristic functions, namely that they are Hermitian, i.e. X(−σ, −ω) = X * (σ, ω). By setting σ 2 = 0, ω 2 = 0 in (8), we can use this Hermitian property to derive the fact that |X(σ 1 , ω 1 )| 2 = |Y(σ 1 , ω 1 )| 2 for any σ 1 , ω 1 . In particular, We can also rewrite those same terms thanks to the same Hermitian property.
This holds in a region near the origin, and another property of characteristic functions is that, if their probability distribution has finite support, the probability distribution is uniquely determined by the value of the characteristic function in a region of the origin. Thus, in the spatiotemporal domain, y(n, t) = x(n + a, t + b) with constants a, b.
Proof of Theorem 2 (N dimensions) Here we extend the preceding proof to N dimensions. This would be necessary for analysing many clinical and experimental data sources, e.g. the plane of a two-dimensional micro-electrode array produces 2 + 1 dimensional data (as in our example presented in Fig. 5). Even higher dimensions may be useful in the case where long-range connections can be represented as a hidden dimension (e.g. orientation tuning in V1). The proof is generic enough that in practice the result applies to all finite datasets. Note that the number of motifs (i.e. the size of the triple correlation matrix) scales with exponent 2N + 2.
Since the properties of characteristic functions that are key to this proof still hold in N dimensions (see any introductory probability theory text, e.g. (16)), the proof proceeds identically to that in two dimensions, but with vector notation.
Given the equality of third-order correlation functions, we can use Equation (5) to find that To borrow some convenient results from probability theory, we note that X and Y can be considered characteristic functions since we can consider x and y probability distributions: as finite images, x and y are bounded and nonnegative, and without loss of generality we can normalize them such that their integral is 1. Characteristic functions (the Fourier transforms of probability distributions) have two properties that are convenient for our purposes, the first of which is that they are non-zero in a region around the origin, thus allowing us the following division (for further considerations of rigor in the one dimensional case, cf. (46)): Next we rewrite both complex functions in terms of their amplitude and phase, i.e. X(Ω) = |X(Ω)|e jϕ(X(Ω)) , where ϕ gives the phase. So then we can rewrite the right-hand side of (18) Here we use the second convenient property of characteristic functions, namely that they are Hermitian, i.e. X(−Ω) = X * (Ω). By setting Ω 2 = 0 in (19), we can use this Hermitian property to derive the fact that |X(Ω 1 )| 2 = |Y(Ω 1 )| 2 for any Ω 1 . In particular, We can also rewrite those same terms thanks to the same Hermitian property.
Simple complex properties are |X| = |X * | and ϕ(X) = −ϕ(X * ), which give us Next, we return from the amplitude and phase notation to the complex functions themselves: Now we define H(Ω) = X(Ω) Y(Ω) so we can rewrite (23) as Therefore by a basic result of complex analysis H(Ω) = e jk·Ω , and thus This holds in a region near the origin, and another property of characteristic functions is that, if their probability distribution has finite support, the probability distribution is uniquely determined by the value of the characteristic function in a region of the origin. Thus, in the spatiotemporal domain, Computing the triple correlation For computational applications presented here, we consider third-order correlation analysis applied to a discrete raster with neuron rows n, 1:N, and time columns t, 1:T, where each pixel r(n, t) is filled with a 0 (white for no activity) or 1 (black for spike). The complete triple correlation function, while formally requisite, incurs substantial computational cost while adding increasingly noisy information. We limit our computation to lag windows −W t : W t and −W n : W n chosen such that two local synaptic connections could not take longer than W t , and two neurons are unlikely to be connected further than W n . With this restricted lag window, we do not zero pad our raster (as in (13)) but instead use boundary conditions periodic in space (since our spatial ordering was already arbitrary 1 ) and restrict our calculation to a subset of time such that no motifs extend beyond time zero or the raster's duration. In this case, discrete equivalent cd 3 of the triple correlation expression c 3 (Equation 1) or the short form reported in the main text, c 3 (n 1 , t 1 , n 2 , t 2 ) = ⟨r(n, t)r(n + n 1 , t + t 1 )r(n + n 2 , t + t 2 )⟩ n,t , is We scale by the number of spike bins in the summation Equations (27) and (29) were used in all our simulations. Note that in the text we report triple correlations "using lags up to X in time and space." This corresponds to W n = X/2 and W t = X/2. When X is odd, we use the more general lag windows −floor(X/2) : ceil(X/2), with corresponding changes to the summation bounds and scaling instead by (T − X)(N). We report the maximum lag as X because, e.g., cd 3 (−10, t 1 , 10, t 2 ) has a maximum temporal separation between spikes of 20, despite the fact that technically the lag arguments are at most 10.

Summarizing triple correlation into motif classes
We summarized the triple correlation as fourteen motif-classes (Fig. 1). The main text discusses the reasoning for this particular choice, and here we describe the details.
Let [n 1 , t 1 |n 2 , t 2 ] denote a motif, i.e. an argument to triple correlation, which would have a value c 3 (n 1 , t 1 , n 2 , t 2 ). We want to group together these motifs according to what we can qualitatively distinguish. We only hold two qualitative distinctions: 1. temporal: we distinguish between before, simultaneous, and after (temporal coordinate less than, equal to, or greater) 2. spatial: we distinguish between the same and different (spatial coordinate equal and spatial coordinate not equal) We will interpret the motif as a three node graph: one base node implicitly (0,0), and two other nodes, (n 1 , t 1 ) and (n 2 , t 2 ). Note that neither of our qualitative distinctions involve node identity. So, for example, we would not distinguish between [0, 1|0, 2] and [0, 2|0, 1], even though one the first node follows the second, and in the other the reverse is true. In both, the motif includes nodes at the same spatial coordinate in sequence, so we do not distinguish between them.
By grouping motifs that are indistinguishable under these criteria, we develop motif classes. We enumerate these motif classes in Figure 1. From that enumeration, it is clear that no motif can belong to two motif classes. To show that this enumeration is complete (there are no more motif classes and all motifs fall into one of these classes), we count the number of motif classes below. To begin, we count "lag-sign" motifs, which is to say, if we only care about the sign of the lags, how many motifs are there. This first step avoids some of the small complications in our definition, i.e. the different treatments of space and time, as well as the subtle ways they interact when we discard node identity. Having counted lag-sign motifs, we then reduce these lag-sign motifs by discounting spatial direction and node identity to finally arrive at the number of motif classes.
The number of lag-sign motifs Given that three node combinations determine c 3 , we want to know how many ways there are to order them in time and space, with the caveat that we may also place one node on top of another in either time, or space, or both (to account for zero lag). To count this, we divide the cases according to how many different temporal and spatial bins the nodes have between them. For example, if all the nodes are completely distinct in time and space, then there are both three spatial bins and three temporal bins between the three nodes. On the other hand, if we overlap all three nodes in both time and space, then there is only one spatial bin and one temporal bin between them. All the cases are labeled in Table S1. Note that the cases are symmetric, so we will calculate the values as if in the lower left half of the table, i.e. where the number of spatial bins is greater than or equal to the number of time bins. The arrangement of this table is the same as in Figure 1C. As described above, we call this arrangement of nodes separated by lags a motif. When we only note the sign of the lags, we call it a lag-sign motif. B. Given that the nodes only have two distinct spatial bins between them, there are three ways to choose which nodes share the same spatial bin, and then two ways to choose whether the remaining node has a spatial bin before or after the identical nodes' spatial bin, for 3 · 2 = 6 spatial orderings. As above, if no time bins are identical, then there are 3! temporal orderings. So there are 3 · 2 · 3! = 36 orderings.
C. If nodes share the same spatial bin, then there is only one way to order that neuron. As above, if no time bins are identical then there are 3! temporal orderings, giving a total of 1 · 3! = 6 orderings. The same is true in the symmetric case when the nodes share the same time bin, but all have different neurons.
D. We already established that for the case where two spatial bins are the same, there are 3 · 2 = 6 orderings. The same is true when the nodes only have two distinct time bins. So there are 3 · 2 · 3 · 2 = 36 orderings.
E. We already established that for the case where two spatial bins are the same, there are 3 · 2 = 6 orderings, and for the case where all time bins are the same, there is only one ordering. So there are 3 · 2 · 1 = 6 orderings.
F. In the case where all spatial bins and all time bins are the same, there is only one ordering.

The number of motif classes
To complete our summary, we reduce the 169 lag-sign motifs listed in Table S2 by appealing to two symmetries: one in space, and one in permutation. The spatial one is straightforward: since we have not assigned meaning to space, reordering the spatial bins does not change our interpretation. The permutation symmetry is more technical: the way triple correlation is calculated, the ordering of the nodes matters. In second-order correlation (call it ρ), this is even symmetry: ρ(x) = ρ(−x). More generally, a motif has some polygonal structure (or possibly a point or a line; treat these as straightforward special cases of the following argument). Each vertex of the polygon has a node identity: in the triple correlation case, one is the base node, one is the first node, given by the base node plus (n 1 , t 1 ), and one is the second node, given by the base node plus (n 2 , t 2 ). But we don't actually care which is the base, which is the first, and which is the second node: triple correlation is invariant under permutation of these nodes. We reduce the number of lag-sign motifs by accounting for these symmetries. Again working from Table S1: A. There are 3! ways to permute node identity, and 3! ways to permute neuron identity, independently, giving a product of 36 permutations. So all 36 lag-sign motifs in category A reduce to 1 motif class (Fig. 1C, XIII). Table S1: The number of possible orderings in space and time given two possible time lags and two possible spatial lags, which corresponds to up to three possible bins in both time and space. Note that in case of a spike raster of single unit activity, each spatial bin corresponds to one neuron. This table corresponds to Fig. 1C in the main text.
B(2T). For category B with two time bins, there are 3! node permutations, and only 3 spatial permutations: two spatial bins share a time bin, and the third spatial bin may appear above, between, or below these two spatial bins (due to the fact that we are reducing from lag-sign motifs, not from lag-motifs). The product is 18 total permutations, so the 36 lag-sign motifs in category B(2T) reduce to 2 motif classes (Fig. 1C, XI and XII).
B(2S). For category B with two spatial bins, there are only two spatial permutations: one spatial bin has two time bins, and the remaining spatial bin can either be above or below. There are still 3! node permutations, leading to a total of 12 permutations. Therefore the 36 lag-sign motifs in category B(2S) reduce to 3 motif classes (Fig.  1C, VIII, IX, and X).
D. By the same argument as in B(2S), there are two spatial permutations, and again 3! node permutations, leading to a total of 12 permutations. Therefore the 36 lag-sign motifs in category D reduce to 3 motif classes (Fig. 1C, V, VI, and VII).
C(1T). When there is only one time bin, spatial permutation is the same as node permutation, so there are only the 3! = 6 permutations. Therefore the 6 lag-sign motifs in category C(1T) reduce to 1 motif class (Fig. 1C, IV). E(1T). As in C(1T), all 6 reduce to 1 motif class (Fig. 1C, III).
When there is only one spatial bin, the lag-sign motifs have no spatial permutations, only node permutations, so the 6 lag-sign motifs in category C(1S) reduce to 1 motif class (Fig. 1C, II). E(1S). As in C(1S), all 6 reduce to 1 motif class (Fig. 1C, I) F. In the case where all lags are zero, triple correlation has no symmetric entries but the zero entry, so there are no permutations, and the 1 lag-sign motif (which also contains only one lag-motif) remains 1 motif class (Fig. 1C, 0).
The above sum to 14 motif classes, as depicted in Figure 1.

Expected contributions per motif class
Because it is important to asses how these motif-class contributions be explained by chance (or not), we simulate 100 activity matched noise rasters for each raster under investigation, and compare the raster's motif-class spectrum with that of its noise matched rasters. In the following, we derive the theoretical expectations for rasters governed by noise. Let r be a raster with N neurons and T time bins. In a binary raster, a spike is indicated by r(n, t) = 1, else r(n, t) = 0. We calculate the triple correlation c 3 over the whole raster with periodic boundary conditions (i.e. r(n+η, t+τ) = r((n+η) mod N, (t+τ) mod T ), with an additional 1 offset). Define π(x, λ) = r(x)r(x + λ 1 )r(x + λ 2 ) and for shorthand π(λ) = π(0, λ). In general with lags λ = (η 1 , τ 1 , η 2 , τ 2 ) where I(r) is the set of all indices of r, in this case I(r) = (1 : N) × (1 : T ). We are interested in the motif class contributions, which are the sums of triple correlations for lags that constitute each of the fourteen motifs. So, if λ ∈ ∆(M i ) denotes that the lags λ constitute a triplet in motif class i, then we define the motif class contribution as The expectation then is given by

⟨π(λ)⟩
In fact, ⟨π(λ)⟩ is constant for all λ ∈ ∆(M i ) for a given i, so We can easily calculate ⟨π(λ)⟩ if we assume independent bins. See below for the calculation in each of the cases of independent Bernoullis. #(∆(M i )) is the more challenging problem, which we calculate case-by-case below in the section "The number of triplet motifs."

Independent spiking with constant probability (Bernoulli)
For a simple spiking raster, we assume all bins have independent probability p of spiking, i.e. they are all independent Bernoullis.

The number of triplet motifs
The quantity #(∆(M i )) simply counts the number of motifs in a given motif class. This is independent of the data's distribution, but dependent on the data's shape. Let Λ t be the number of time lags considered, and let Λ ± t , Λ + t , and Λ − t be the number of time lags that are nonzero, positive, and negative respectively. Similarly define Λ n , Λ ± n , Λ + n , and Λ − n . Our first step for each motif class will be to define ∆(M i ), for which purpose we will define a notation to indicate the sign and equality of each part of the two spatiotemporal lags (the third lag implicitly zero). For example, in the case of motif class XIII, we write ∆(M XIII ) = [x ± , t ± | y ± , s ± ]. This indicates that all four lags are nonzero (± superscript) and none are equal (x ̸ = y and t ̸ = s). In contrast, we could write [0, t + | 0, t + ] to indicate both spatial lags are zero, and the temporal lags are both positive and equal. We add a coefficient "2" to indicate that we want to additionally include set of lags symmetric with the notated lag, i.e. flipping which lag is first and which is second, e.g.

I Motif class I consists in lags
We count these as follows: there is only one way to choose the (0, 0) lag, and Λ ± t ways to choose a (0, t ± ) lag. Therefore there are 1 · Λ ± t lags in [0, 0 | 0, t ± ]. By symmetry there are the same number in [0, t ± | 0, 0], so the coefficient 2 works as advertised, giving 2Λ ± second lag, leading to a Λ ± t lags in [0t ± | 0t ± ]. In total then, there are 3Λ ± t lags in ∆(M I ) = 2[0, 0 | 0, t ± ] + [0t ± | 0t ± ]. We write this This time for our second lag, we have one fewer choice because we must have that the temporal lags do not equal. So we find III For motif class III, the situation is the same as motif class I, but with n rather than IV Similarly motif class IV has #(∆(M III )) = Λ ± n (Λ ± n − 1). V VII By symmetry with VI,  (2 4 ) groups while the subgroups within these groups, represent the arrangement of the non-zero lags. The total number of rows is 169 lag-sign motifs for three-node motifs with distinct lag signs (Table S1). Each row represents a single three-node lag-sign motif, which are differentiated by the spike sequencing. The base node implicitly has lags n 0 = 0 and t 0 = 0, and the four lags of the other two nodes (n 1 , n 2 , t 1 , t 2 ) are those in Equation (27). In the lag columns, '0' entries denote that the lag is zero while '+' entries denote a positive value, and '-' entries denote a negative value (respectively greater or less relative to the base node's spatiotemporal position). Note that in case where one modality (space or time) shares the same (non-zero) sign, we need additional constraints to specify the spike sequence, given by the column "Constraints.